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Abstract 

The hierarchically orthogonal functional decomposition of any measurable 
function rj of a random vector X = (X±, • ■ • , X p ) consists in decomposing ry(X) 
into a sum of increasing dimension functions depending only on a subvector 
of X. Even when X\,--- ,X p are assumed to be dependent, this decompo- 
sition is unique if components are hierarchically orthogonal. That is, two of 
the components are orthogonal whenever all the variables involved in one of 
the summands are a subset of the variables involved in the other. Setting 
Y = ??(X), this decomposition leads to the definition of generalized sensitivity 
indices able to quantify the uncertainty of Y with respect to the dependent 
inputs X [9]. In this paper, a numerical method is developed to identify the 
component functions of the decomposition using the hierarchical orthogonality 
property. Further, the asymptotic properties of the components estimation is 
studied, as well as the numerical estimation of generalized sensitivity indices 
though a toy model. In addition, a model coming from real world illustrates 
the interest of the method. 
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1. INTRODUCTION 



In a nonlinear regression model, input parameters can be subject to many sources 
of uncertainty. The objective of global sensitivity analysis is to identify and to rank 
the input variables that drive the uncertainty of the model output. The most pop- 
ular methods are the variance-based ones [25]. Among them, the Sobol indices are 
widely used [26]. This last method stands on the assumption that the incomes are 
independent. Under this assumption, Hoeffding [13] shows that the model output 
can be uniquely decomposed into a sum of increasing dimension functions, where 
the integrals of every summand over any of its own variables must be zero. A conse- 
quence of these conditions is that all summands of the decomposition are mutually 
orthogonal. Using this decomposition, Sobol shows that the global variance can also 
be decomposed as a sum of partial variances. Thus, the so-called Sobol sensitivity 
index for a group of inputs is the ratio between the partial variance associated to 
these inputs and the global variance [26]. However, in models with dependent inputs, 
the use of Sobol indices may lead to a wrong interpretation because the sensitivity 
induced by the dependence between two factors is implicitly included in their Sobol 
indices. To handle this problem, a naive solution can consist in computing Sobol sen- 
sitivity indices for independent groups of dependent variables. First introduced by 
Sobol [26], this idea is exploited in practice by Jacques et al. [T5]. Nevertheless, this 
technique implies to work with models having several independent groups of inputs. 
Furthermore, it does not allow to quantify the individual contribution of each input. 
A different way to deal with this issue has been initiated by Borgonovo et al. [2 [3]. 
These authors define a new measure based on the joint distribution of (Y,X). The 
new sensitivity indicator of an input X$ measures the shift between the output dis- 
tribution and the same distribution conditionally to . This moment free index has 
many properties and has been applied to some real applications [SJE]- However, the 
dependence issue remains unsolved as we do not know how the conditional distri- 
bution is distorted by the dependence, and so how it impacts the sensitivity index. 
Another idea is to use the Gram-Schmidt orthogonalization procedure. In an early 
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work, Bedford [T] suggests to orthogonalize the conditional expectations and then 
to use the usual variance decomposition on this new orthogonal collection. He then 
uses the Monte Carlo simulation to compute the indices. In the same spirit, Mara 
et al. [21] use the same Gram-Schmidt tool to decorrelate the inputs, but perform 
polynomial regressions to approximate the model. In both papers, the decorrelation 
method depends on the ordering of the variables, making the procedure computa- 
tionally expensive and difficult to interpret. 

Following the construction of Sobol indices previously exposed, Xu et al. [31] propose 
to decompose the partial variance of an input into a correlated and an uncorrelated 
contribution in the context of linear models. This last work has been later extended 
by Li et al. with the concept of HDMR [191 12Q]. In [19], the authors suggest to 
reconstruct the model function via classical basis (polynomials, splines,...), then to 
deduce the decomposition of the response variance as a sum of partial variances 
and covariances. Instead of classical basis, Caniou et al. [8] use a polynomial chaos 
expansion to approximate the initial model as far as the copula theory to model the 
dependence structure [22] , Thus, in all these papers, the authors choose a type of 
model reconstruction before proceeding to the splitting of the response variance. 
In a previous paper [9], we revisit the Hoeffding decomposition in a different way, 
bringing a new definition in the case of dependent inputs. Inspired by the pioneering 
work of Stone |27j and Hooker p3], we show, under a weak assumption on the inputs 
distribution, that any model function can be decomposed into a sum of hierarchically 
orthogonal component functions. This means that two of these summands are or- 
thogonal whenever all variables included in one of the components are also involved 
in the other. The decomposition leads to generalized Sobol sensitivity indices able 
to quantify the uncertainty brought by dependent inputs on the model. 
The goal of this paper is to complete the work done in [9] by providing an efficient 
numerical method for the estimation of the generalized Sobol sensitivity indices. In 
our previous paper [9], we have proposed a statistical procedure based on projection 
operators to identify the components of the hierarchically orthogonal functional 
decomposition (HOFD). The method consists in projecting the model output onto 
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constrained spaces to obtain a functional linear system. The numerical resolution 
of these systems relies on an iterative scheme that requires to estimate conditional 
expectations at each step. On one hand, this method is well tailored for independent 
pairs of dependent variables models. On the other hand, it is difficult to apply to 
more general models because of its computational cost. Hooker p3] has also worked 
on the estimation of the HOFD components. This author studies the component 
estimation via a minimization problem under constraints using a sample grid. In 
general, this procedure is also quite computational demanding. Moreover, it requires 
to get a prior on the inputs distribution at each evaluation point, or, at least, to 
be able to estimate them properly. In a recent article, Li et al. [18] come back on 
Hooker's work and also identify the HOFD components by a least-squares method. 
They propose to approximate these components using their expansions on suitable 
basis. They bypass some technical problem of degenerate design matrix by using a 
continuous descent technique [TT] . 

In this paper, we propose an alternative to directly construct a hierarchical orthogo- 
nal basis. Inspired by the usual Gram-Schmidt algorithm, the procedure consists in 
recursively constructing for each component a multidimensional basis that satisfies 
the hierarchical orthogonal conditions. This procedure will be referred as the Hierar- 
chical Orthogonal Gram-Schmidt (HOGS) procedure. Then, each component of the 
decomposition can be properly estimated by a linear combination of this basis. The 
coefficients are then estimated by the usual least-squares method. Thanks to the 
HOGS procedure, we show that the design matrix has full rank, so the minimization 
problem admits a unique and explicit solution. Further, we study the asymptotic 
properties of the estimated components. Nevertheless, the practical estimation of 
the one-by-one component suffers from the curse of dimensionality when using the 
ordinary least-squares estimation. To handle this problem, we propose to estimate 
parameters of the model using variable selection methods. Two usual algorithms are 
briefly presented, and are adapted to our method. Further, the HOGS procedure 
coupled with these algorithms is experimented on numerical examples. 
The paper is organized as follows. In Section [2J we give and discuss the general 
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results on the HOFD. We remind Conditions (jC.lj) and (|C.2j) under which the HOFD 
is available. Further, we give the generalized Sobol sensitivity indices, and discuss 
their interpretation. Section [3] is devoted to the HOGS procedure. We introduce the 
appropriate notation, and give a detailed procedure. In Section^ we adapt the least- 
squares estimation to our problem, and we point out the curse of dimensionality. 
Further, we discuss about variables selection via a penalized minimization. Section [5] 
brings asymptotic results on the component estimators, constructed according to the 
HOGS procedure of Section [3l In Section [6l we present numerical applications. The 
first example is a toy function, and its objective is to show the efficiency of the HOGS 
procedure coupled with variable selection methods in the sensitivity estimation. The 
last example concerns the pressure applied to a tank. In this example, we want to 
detect the most influent inputs in the model with our procedure. 

2. GENERALIZED SOBOL SENSITIVITY INDICES 

Functional ANOVA models are specified by a sum of functions depending on an 
increasing number of variables. A functional ANOVA model is said to be additive if 
only main effects are included in the model. It is said to be saturated if all interaction 
terms are included in the model. However, the existence and the uniqueness of 
such decomposition is ensured by some identifiability constraints. When the inputs 
are independent, any regular model function is exactly a saturated ANOVA model 
with pairwise orthogonal components, as reminded in the introduction. It results 
that the contribution of any group of variables onto the model is measured by the 
Sobol index, bounded between and 1. Moreover, the Sobol indices are summed 
to 1 [26j. The use of such an index is not excluded in the dependence context, 
but the information due to the dependence is considered several times. This could 
lead to a wrong interpretation of the Sobol indices. In this section, we remind the 
main results established in Chastaing et al. [9] when inputs can be non-independent. 
In this case, the saturated ANOVA model is established with weaker identifiability 
constraints than for the independent case. This leads to a generalization of the 
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Sobol indices well suited to perform global sensitivity analysis when the inputs are 
not independent. 

First, we remind the general context and notation. The last part is dedicated to 
the generalization of the Hoeffding-Sobol decomposition when inputs are potentially 
dependent. The definition of the generalized sensitivity indices follows. 

2.1. First Settings 

We denote by C the strict inclusion, that isAcB=>AnB^B, whereas we use 
C when equality is possible. 

Consider a measurable function r\ of a random vector X = (Xi, • • • , X p ) G W, p > 1, 
and let Y be the real- valued response variable defined as 



X t-> ? ? (X) 

where the joint distribution of X is denoted by Px- For a (T— finite measure v on 

(MP, B(W)), we assume that Px «v and that X admits a density px with respect 

, . dP x 

to v, that is px = — ; — • 
dv 

Also, we assume that rj G Lj|(R p , B(W), Px)- As usual, we define the inner product 
(-, •) and the norm || • || of the Hilbert space L^(W, B(W),P X ) as 

{hi,h 2 ) = f fci(x)Mx)px<Mx) =E(/n(X)/ l2 (X)), /n,/i 2 g4(M p ,S(M p ),P x ) 
(/i, fe) = E(/ l (X) 2 ), h G L|(RP, P x ) 



Here E(-) denotes the expectation. Further, V(-) = E[(- — E(-)) 2 ] denotes the vari- 
ance, and Cov(-, *) = E[(- — E(-))(* — E(*))] the covariance. 

Let us denote [1 : k) := {1,2, • • • ,k}, V k € N*, and let S be the collection of all 
subsets of [1 : p). As misuse of notation, we will denote the sets {i} by i, and {ij} 
by ij. For u G S with u = {u±, ■ ■ ■ ,ut}, we set the cardinality of u as \u\ = t and 
the random subvector X u := (X U1 , ■ ■ ■ ,X Ut ). Conventionally, if u = 0, \u\ = 0, and 
X$ = 1. Also, we denote by X_ u the complementary vector of X u (that is, — u 
is the complementary set of u). The marginal density of X u (respectively X_ u ) is 
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denoted by p Xu (resp. px_J- 

Further, the mathematical structure of the functional ANOVA models is defined 
through subspaces (H u ) ueS and (H°) ueS of L^(W,B(W), P x ). H % = H$ denotes 
the space of constant functions. For u € 5'\{0}, H u is the space of square-integrablc 
functions that depend only on X u . The space is defined as: 

H Q U = {h u e H u , (h u , h v ) = o, V v c u,v K e H° v } = H u n (^ H ^j 

\vCu / 

Through the article, we will see that ??(X) can be written as a functional ANOVA 
model in terms of low-order components. We define d := max u |it| the order of a 
functional ANOVA model. Thus, if the ANOVA model is additive, d = 1. If it is a 
saturated model, the order is maximal, and d = p. 

2.2. Generalized Sobol Sensitivity Indices 

Let us suppose that 



Px « v 



where 



u{dx) = vi{dx{) 



Vp(dx p ) 



(C.l) 



Our main assumption is : 



3 < M < 1, V u C [1 : p], p x >M- p Xu Px_ u v-a.e. 



(C.2) 



Under these conditions, the following result states a general decomposition of rj as a 
saturated functional ANOVA model, under the specific conditions of the spaces H® 
(defined in Section 12. 1|) , 

Theorem 1. Let n be any function in B(W), P x )- Then, under [Ul\) and 

iC. there exist functions r)o,rji, ■ ■ ■ , ??{!,... >p } G #0 x #i x • • • ii"^ , suc/i i/iai 
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the following equality holds : 



m + J2vi(Xi)+ r Hj (X i ,X j ) + 



+ ?7{i ! - > p}(X) 



(1) 



Moreover, this decomposition is unique. 



To get a better understanding of Theorem [IJ the reader could refer to its proof and 
further explanations in [9]. Notice that, unlike the Sobol decomposition with inde- 
pendent inputs, the component functions of ([ID are hierarchically orthogonal, and 
no more mutually orthogonal. Thus, further along the article, the obtained decom- 
position ([I]) will be abbreviated HOFD (for Hierarchically Orthogonal Functional 
Decomposition). Also, as mentioned in [S], the HOFD is said to be a generalized 
decomposition because it turns out to be the usual functional ANOVA decomposi- 
tion when incomes are independent. 

The general decomposition of the output Y = n(X.) given in Theorem Q] allows for 
decomposing the global variance as a simplified sum of covariance terms. Further 
below, we define the generalized sensitivity indices able to measure the contribution 
of any group of inputs in the model when inputs can be dependent : 

Definition 1. The sensitivity index S u of order \u\ measuring the contribution of 
X u into the model is given by : 



FMXu)) + Eunv^v Cov( Vu (X u ), Vv (X v )) 



V{Y) 



(2) 



More specifically, the first order sensitivity index Si is given by : 



Si 



V(Vi(Xi)) + £^0 Cov( Vl (Xi), Vv (X v )) 

W) 



(3) 



These indices are called generalized Sobol sensitivity indices because if all inputs are 
independent, it can be shown that Cov(r/ u , r] v ) = 0, V u v 0. 



Proposition 1. Under liC. 1\) and liC.S\) . the sensitivity indices S u previously defined 
sums to I, i.e. E u6 S\{0} S v = L 



Interpretation of the sensitivity indices 

As the covariance term Cov(r] u ,^2 unv _ iuv r] v ) in S u can be negative, S u is no more 
bounded between and 1 as in the independent case. Hence, the interpretation 
of our indices here is much less obvious. However, we could interprete a sensitiv- 
ity index S u given by ([2]) as follows. The form of a sensitivity index S u allows for 
distinguishing two parts: the first part, V(rj u )/V(Y) could be identified as the full 
contribution of X u , whereas the second part, Cov(t?. u , Ylunv^u v r] v )/V(Y) could be 
interpreted as the contribution induced by the dependence with the other terms of 
the decomposition. Thus, the covariance terms would play the role of compensation. 
If S u > 0, the covariance contribution strengthens the variance term if it is positive, 
whereas it weakens the full contribution if not. In the case of a negative sensitivity 
index, the contribution induced by the dependence dominates in the index, that 
would show a small full contribution of the variable itself with respect to the total 
sum of covariances. As an illustration, we consider a model with p = 2 inputs. If 
the component i]\ is strongly negatively correlated with ij2, the variations of r/i is 
going to impact on the variations of r/2- Thus, it may result a negative index. This 
tells us about the importance of the dependent part here. Nevertheless, the covari- 
ance contribution is the same in S\ and S^- Thus, if S± is greater than S2, the full 
contribution of S\ will be bigger than the one in 52- In this case, we are able to 
rank the input parameters. 

However, these results are not new, as they are developed in [9]. As a continuity of 
this article, we propose here to estimate the functional ANOVA components in the 
second part. In Theorem [TJ we show that each component r] u belongs to a subspace 
H® of L 2 (W P , B(M. P ), -Px)- Thus, to estimate r] u , the most natural approach is to 
construct a good approximation space of H®. The next section aims at proposing a 
procedure to construct such a space. 
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3. THE HIERARCHICAL ORTHOGONAL GRAM-SCHMIDT 

PROCEDURE 



In Section [21 we have seen that the generalized sensitivity indices are defined for any 
type of reference measures (^j)ig[i:p]- From now and until the end, we will assume 
that Vi, V i £ [1 : p], are diffuse measures. Indeed, the non diffuse measures raise ad- 
ditional issues in the results developed further that we will not address in this paper. 



In a Hilbert space, it is usual to call in an orthonormal basis to express any of the 
space element as a linear combination of these basis. Further below, we will define 
the finite-dimensional spaces C H u and Hu L C H®, V u £ S, as linear spans of 
some orthonormal systems that will be settled later. We take the notation Span {B} 
to define the set of all finite linear combination of elements of B, also called the linear 
span of B. 

Consider, for any i £ [1 : p] , a truncated orthonormal system [ipj. of I? (R, £>(R) , ) , 
with Li > 1. Further, let us denote the vector of sizes as L = t {L\, • • • , L p ). Also, we 
denote by l u = ■ ■ ■ , Z^, • • • )j eu the multi-index associated to the tensor-product 
of )• Hence, l u £ x [1 : Li], where x [1 : Li] denotes the Cartesian product 

of the set [1 : Li], for i £ u. To define properly the truncated spaces C H u , we 
need first to assume that 



V u £ S, V k £ [1 : Li], J(n te u^(^)) 2 Px^(x) < +oo (C.3) 



Remark. A sufficient condition for hC.2\l is to have < M\ < px < M2 (see Section 
3 of In this particular case, it is enough to assume that f (Y\ie[i- P ] V'z i ( ;:c i)) 2 ^ z/ ( x ) < 
+00 to warrant liC.3\) . 

Under (|C.3p . we define, = Span{l}. Also, we set, Vt^j€[l:p], 

= Span{l,^i,-" ^IJ 
flg = Span{l,^i,-.- ^i.^i^V-iV-- ,^4®^} 
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where \ for k*i> = l {ij}) G t 1 : x t 1 : L i\ is the tensor 

product orthonormal system of Hf* (g) . More generally, for any u such that 

|u| > 1, we set 

Hu = Span I l,(®»ei;^)l.=('i)e x [i:^]' V w - u ' v ^ f 
Notice that dim(i?^) = 1 + ^2 v c u Y\i^ v Li- Now, we define the corresponding spaces 
up to the hierarchical orthogonality constraints. First, H^' = H^. For u € S\ {0}, 

H° U ' L = {h u e Ht (h u , h v ) = o,\/vcu,v h v e H?> L } 

From now, we denote by L u the dimension of Hu ,L , for u € S \ {0}. By definition 
of H° U > L , we get L u = dim(#£) - [£,c« EU ^ + 1] = Iliet, ^- 

Suppose that we observe an independent and identically distributed sample (y s , x s ) s= i r .. >n 
of size n from the distribution of (Y, X). We define the empirical inner product (•, •)„ 
and norm || • || n as 

1 - 

<ffi,52) n = - Y]5i(x s )52(x s ), = (5,5}n 

We define the finite-dimensional linear subspaces (Gu'n) u es as the approximating 
spaces of (Hu' L ) u es, when the scalar product (•, •) is replaced by the empirical one. 
First, GJlf = H^, and, for u € 5 \ {0}, 

G°u L n = {g u € = 0,y v CZ u,V g v G } 

Now we have determined (Hu ,L ) U £S an d we want to build them. In 

the next procedure, we propose an iterative scheme to construct them, taking into 
account their specific properties of orthogonality. 

HOGS Procedure 

1. Initialization: For any i G [1 : p], take a truncated orthonormal system 
WJk=o> L i ^ X > of L 2 (R,B(R),P Xz ) such that % = 1. Set <f>\. = 
V l{ > 1 and i^j '^ = iSpan { 0| , • • • ,4> l L .}. As {4>\.)^ z = i is an orthonormal 
system, h(Xi) = Ti-=i i x i) satisfies E(^(Xi)) = 0. 
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2. To build a basis x [i-la °f Hu ,L , with \u\ = k, we proceed recursively 

on \u\. Suppose that, for any v € S such that 1 < \v\ < k — 1, we get 

H°> L = Span { <f>l , l v = (P v ) e x[l:Li]\, L v := dim{H^ L ) = TT L t 



To construct {4>f u )i u £ x [i-.Li]> with u := {u\, ■ ■ ■ , Uk}, we proceed as follows: 
for all ,l^)e[l:L Ul ]x---x[l:L Uk ], 

(a) set 

C(x u ) = (<^\ ®- • -®^4)(x u )+E E ^AC( x v)+cf„ (4) 

»C«|.6 X [l:Li] 

(b) compute the (l + Y,v<zuL v ) coefficients (C, {\\ , ) lv& x [ 1:Li ] )VCu ) by solv- 
ing 



uy M ) = 0, VdCm, VI t e x[l: 

iet> (5) 

Wii/i |^p, i/ie linear system is equivalent to a sparse matrix system 
of the form A l ^\ u = D lu , when Cj^ has been removed. The matrix 
is a Gramian matrix involving terms E($„ 1 (X. Vl ) l & V2 (X V2 ))„ lj „ 2Cu , with 
($ Vi (X Vl )) lv . = (X Vi ),Z„. G x [1 : Lj\), i = 1,2. A n tnvo/ve* toe 
terms (Xi v lu )i v e x [i:Li],«c«j and involves -E(®$ =1 </>£$ Vi ) ViCu . in 
Lemma [0, we s/iow i/iai ^ is a definite positive matrix, so the system 
admits a unique solution. 

Set Hu' L = Span < </>% , l u € x [1 : Lj] L uwtfi L n := dim(Hu' L ) = \[ k i=l L Ui . 
[ u i&i ) 

The construction of (Gu,n)ueS is ver y similar to the (Hu' L ) u ^s one. However, as 
the spaces (Gu,n)ueS depend on the observed sample, their construction requires 
to assume that the sample size n is larger than the sizes Li, i & [1 : p\. To build 
G®'^ , V i € [1 : p], we use the usual Gram-Schmidt procedure on (,4'i i )i'.=i *° 
an orthornormal system with respect to the empirical inner product (•, -) n . 

To build (Gu,n)ueS,\u\>2i it is enough to replace (•, •} by (-, •)„ and to use the HOGS 
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procedure. At the end, we denote Gu,n = Span < ipf n , l u 6 X [1 : Lj] >, V u G 
5\{0}. 

In practice, polynomials or splines basis functions [10] will be considered. Also, 
for practical reasons and motivation exposed in Section [51 we will approximate the 
model output by an ANOVA model of order at most d = 3. In the next section, we 
discuss the practical estimation of generalized Sobol sensitivity indices using least- 
squares minimization. Further, we also discuss the curse of dimensionality, and 
propose some variable selection methods to handle it. 

4. ESTIMATION OF THE GENERALIZED SENSITIVITY 

INDICES 

4.1. Least-Squares Estimation 

The effects (j] u ) ueS in the HOFD flD satisfy 

( Vu ) ue3 = Arg ■min E[(Y - V t? u (X u )) 2 ] (6) 

- rrO «GS 

Notice that 770, the expected value of Y, has not interest for the sensitivity indices 
estimation. Thus, Y := Y — E(Y) replaces Y in Also, the residual term rjri ... >p \ 
is removed from ([6]) and it is estimated afterwards. In Section [3j we defined the 
approximating spaces Gu,n of H®, for u £ S \ {0}. Thus, the minimization problem 
(0) may be replaced by its empirical version, 

2 



min — 



y s - E E A>L>u s ) 



uC[l:p]I u e x [l-.Li] 



(7) 



where y s := y s — y, y '■= ^ Yls=i V s ■> an< ^ wnere every subspace G u ' n is spanned by 
basis functions (iff n )j ug x [ula constructed according to the HOGS Procedure of 
Section [3) The equivalent matrix form of (J7J) is 



rnin||Y-X^ (8) 
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where Y s = y s -y,X v = ... v ... € x Ai„ ;i „(R), where, x M n , iu (K) 

denotes the cartesian product of real entries matrices with n rows and L u columns. 
For ueS, {^ u ) sU = cftjxu 8 ), V * E [1 : n], V l u E x [1 : L,]. (/3) Jtl , u = 
V/ u € x [l:L;],VuC [1 : p], u / 0. 

Remind that the dimension of spaces H®' is denoted by L u = Y\i eu Li, ^ u ^ 
S \ {0}. Thus, the number of parameters to be estimated in ([S]) is equal to m : = 
Y2uc[l:p] Lu- Let us remark that it would be numerically very expensive to consider 
the estimation of all these coefficients. Even for small ANOVA order d, the number 
of terms blows up with the dimensionality of the problem, and so does the number 
of model evaluations when using an ordinary least-squares regression scheme. As an 
illustration, take d = 3, p = 8 and Lj = L = 5, V i 6 [1 : p\. In this case, m = 7740 
parameters are to be estimated, which could be a difficult task in practice. To handle 
this problem, many variable selection methods have been considered in the field of 
statistics. The next section aims at briefly exposing the variable selection methods 
via a penalized regression. We particularly focus on the £q penalty [28] and on the 
Lasso regression [29] , 

4.2. The Variable Selection Methods 

For simplicity, we denote by m the number of parameters in ([8|). The variable 
selection methods usually deal with the penalized regression 

mm||Y-X^||2 + AJ(/3) (9) 

where J(-) is positive valued for (3 ^ 0, and where A > is a tuning parame- 
ter. The most intuitive approach is to consider the ^o-P ena hy J(&) = ||/3||o> where 
||/3|| = Y^j=i^-(.Pj 7^ 0)- Indeed, the £q regularization aims at selecting nonzero 
coefficients, thus at removing the useless parameters from the model. The greedy 
approximation [28] offers a series of strategies to deal with the £o-P ena lty. Neverthe- 
less, the £o regularization is a non convex function, and suffers from the statistical 
instability, as mentioned in [UJ [2?J] . A convex relaxation of the optimization problem 
can be viewed with the Lasso regression [29J. Indeed, the Lasso regression corre- 
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sponds to the ^-penalty, i.e. © with J(fl) = and = Ylj=i \Pj\- The 

Lasso offers a good compromise between a rough selection of nonzero elements, and 
a ridge regression ( J(j3) = X]j=i Pj) that only shrinks coefficients, but is known to 
be stable [7J [12]. To offer a good panel to the reader, we will adapt our method to 
the £o and to the l\ regularization. 

The adaptive forward-backward greedy (FoBa) algorithm proposed in Zhang [32] is 
exploited here to deal with the £q penalization. From a dictionary T> that can be 
large and/or redundant, the FoBa algorithm is an iterative scheme that sequentially 
selects and deletes the element of T> that has the least impact on the fit. The aim of 
the algorithm is to efficiently select a limited number of predictors. The advantage 
of such approach is that it is very intuitive, and easy to implement. In our problem, 
the FoBa algorithm is applied on the whole set of basis functions. It can then hap- 
pen that none basis function is retained for the estimation of a HOFD component. 
In this case, as we want to estimate each component of the HOFD, the coefficient 
corresponding to this component is set to be zero. 

Initiated by Efron et al. [llj . the modified LARS algorithm is further adapted to our 
problem to deal with the Lasso regression. The LARS is a general iterative tech- 
nique that builds up the regression function by successive steps. The adaptation of 
LARS to Lasso (the modified LARS) is inspired by the homotopy method proposed 
by Osborne et al. [23] , The main advantage of the modified LARS algorithm is 
that it builds up the whole regularized solution path {/3(A), A € R}, exploiting the 
property of piecewise linearity of the solutions with respect to A [71 124] . 
In the next part, both FoBa algorithm and the modified LARS algorithm are adapted 
to our problem and they are compared via numerical examples. 

4.3. Summary of the Estimation Procedure 

Provided an initial good choice of orthonormal systems (^)^.Lo ie[i-p]' we ^ rs ^ con " 
struct the approximating spaces Gu,n of H® for \u\ < d, for d < p, thanks to the 
HOGS Procedure of Section [3j A HOFD component rj u is then a projection onto 
Gu,n, whose coefficients are defined by least-squares estimation. To bypass the curse 
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of dimensionality, the FoBa algorithm or the modified LARS algorithm is used. 
Once the HOFD components are estimated, we deduce the empirical estimation of 
the generalized Sobol sensitivity indices given in Definition [TJ 



5. ASYMPTOTIC RESULTS 

In this section, we then assume that a functional ANOVA of order d, with d <^ p, 
substitutes properly the initial model. Let us denote S d the collection of all subsets 
of [1 : p] of size at most d. We suppose that 

r/(X) « ^(X) = £ rfi(X), with V R = £ C°C(X U ) G 

ueS d ««G X [l:Li] 

In the following, we give the convergence properties of the estimator fj R to r] R , with 
fj R (X) := Y, i B (Xu), with ^(X u ) = £ ft>t )a (X u ) G Gg. 

«GS d i u 6 X [1:1*] 

where x [i^i, m G S 1 are estimated by the minimization problem (JSj) - Thus, 

we are interested in the convergence results when the ANOVA order d and the order 
of truncation L u = Y\i£ U (u G S d ), are fixed. 
Proposition [2] gives the convergence result. 

Proposition 2. Assume that 

Y = r, R (X) + e, where r, R (X) = £ £ ^°C(X U ) G tf°' L 

with E(e) = 0, E(e 2 ) = a 2 , E(e • ^„( x u)) = 0, V Z u G x [1 : LJ, V«eS d . 

** «Gt( 

((3 = (/3"'°)j u ,u G O is i/te true parameter). 

Further, let us consider the least-squares estimation fj R of r] using the sample 
(f/ s ,x s ) se [i :n ] and the functions (<pi v «)i v , u , that is 



f) R (X) = £ ^( X u), where fj R (X u ) = £ 4>IJX U ) G G°^ n 

u£S d l u e x [l:Li] 

where f3 = Argmmp^® ||Y — X v /3|| 2 . If we assume that 
(H) The distribution Px is equivalent to ® p i=l Px i; 
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then 

\\fj R - n R \\ ^4' when n — >• +oo. (10) 

The sketch of proof of Proposition [2] is postponed to Appendix A. For further inter- 
est, the detailed proof of Proposition [2] is postponed to Appendix B, as the annex 
of this document. 

Our aim here is to study how the approximating spaces Gu,n, constructed with 
the previous procedure, behave when n — > +oo. However, we assume that the 
ANOVA order d and the order of truncation L u (u £ S d ) are fixed. By extending 
the work of Stone |27| . Huang |15j explores the convergence properties of functional 
ANOVA models when d and L u are not fixed anymore. Nevertheless, the results are 
obtained for a general model space H u , and its approximating space G u , V u S S. 
In [15j . the author states that if the basis functions are m-smooth and bounded, 
||?7 — rj\\ converges in probability. For polynomials, Fourier transforms or splines, he 

2m 

specifically shows that ||r) — n\\ = O p (n 2m+d) (s ee [Jg p. 257). Thus, to get a good 
rate of convergence, it is in our interest to have a small order d in a model. In the 
next numerical applications, we use this theoretical result to substitute the initial 
model to a functional ANOVA of order at most d = 3. 

6. APPLICATION 

In this section, we consider two numerical applications. The first model is a toy 
function studied in Li et al. [18] . It is used to study the numerical properties of the 
practical method summarized in Section 14.31 The last application is the study of a 
shell subject to an internal pressure. From a finite elements code, we estimate the 
generalized sensitivity indices of input parameters implied into the model. The goal 
is to quantify the sensitivity of each input variable in a context of dependence. 
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6.1. A Test Case 

Remind that the HOGS procedure is used to construct basis functions adapted to 
the hierarchical orthogonality constraints. Hence, the same method improved by 
the adaptive greedy algorithm will be abbreviated GHOGS method (G for Greedy). 
The method improved by the modified LARS algorithm will be called LHOGS (L 
for LARS). 

The estimation procedure suggested in [9], and reminded in the introduction, is 
compared to the G/LHOGS methods. It will be denoted POM (for Projection 
Operators Method), relatively to the projection operators it uses. 
Let X ~ N(0, E) and the model function, 



with 



Y = g 1 (X 1 ,X 2 )+g 2 (X 2 )+g 3 (X 3 



g 1 (X 1 ,X 2 ) = [aiX x + a ][b 1 X 2 + b 
92(X 2 ) = c 2 X 2 +ciX 2 + c 



g 3 (X 3 ) = dzXi + dzXl + dtXs + do 



and 



a\ ^aia 2 ' 







V 



4} 



"i(J\0 2 



Condition (|C.2p does not allow to use the normal distribution, but rather the mixture 
Gaussian one [9]. However, the Gaussian distribution allows for computing a HOFD 
decomposition, as done in [18]. Moreover, if the research of solutions is restricted 
to the polynomial spaces, the uniqueness of the HOFD components given in [18] 
is ensured, whatever the type of distribution. Thus, the analytical form of the 
generalized Sobol indices can be deduced in this case. 

We take ao = c\ = do = 1, a\ = bo = c 2 = d\ = d 2 = 2 and b\ = cq = d 3 = 3. 
The variations are fixed at a± = <r 2 = 0.2, 03 = 0.18 and 7 = 0.6. To show 
the interest of the greedy and Lasso application, we proceed to n = 200 model 
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evaluations repeated 50 times. For each component, we choose a Hermite basis 
of degree 10. Thus, the number of parameters m = 330 > n = 200. In view of 
analytical results given in [T5], it is clear that only 8 among 330 basis functions 
are necessary to restitute entirely the model. Table [1] helps for comparing the POM 
with the GHOGS/LHOGS procedures on the sensitivity indices estimation and their 
standard deviation (indicated into brackets). Table [T] also provides the number of 
nonzero estimated coefficients for the FoBa and the modified LARS algorithm. The 
estimated components fji, 7)2 and 7)3 are represented in Figure [TJ 





Si 


s 2 


S 3 


S12 


S13 


S23 


S123 


E,i(/Wo) 


Analytical 


0.4429 


0.4718 


0.0763 


0.0091 













POM 


0.4402 
(0.021) 


0.4718 
(0.0401) 


0.0810 
(0.0012) 


-0.0014 
(0.001) 










GHOGS 


0.4499 
(0.0272) 


0.4647 
(0.0328) 


0.0754 
(0.0249) 


0.0030 
(0.0032) 




(0) 




(0) 


0.0070 
(0.0024) 


5 to 7 


LHOGS 


0.4534 
(0.0275) 


0.4688 
(0.0314) 


0.0793 
(0.0258) 


0.0060 
(0.0024) 


0.0013 
(0.0012) 


0.0010 
(0.0012) 


-0.0098 
(0.0002) 


22 to 207 



Table 1: Sensitivity indices estimation with the POM and the G/LHOGS methods 
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o Greedy 
o Lasso 








0.5 



Figure 1: Estimation of the HOFD components by the G/LHOGS methods 

The advantage of the GHOGS and of LHOGS methods is to be able to estimate all 
interaction indices, whereas the POM only estimates interaction indices involved in 
dependent pairs [9] . Even if many of the nonzero coefficients in LARS are close to 
zero, this method tends to estimate a large number of nonzero parameters in com- 
parison with the GHOGS procedure. Through this example, the greedy restitutes 
properly the information with a relevant selection of coefficients. 



6.2. The Tank Pressure Model 

The case study concerns a shell closed by a cap and subject to an internal pressure. 
Figure [2] illustrates a simulation of tank distortion. We are interested in the von 
Mises stress [3D] on the point y labeled in Figure The von Mises stress allows 
for predicting material yielding which occurs when it reaches the material yield 
strength. The selected point y corresponds to the point for which the von Mises 
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stress is maximal in the tank. Therefore, we want to prevent the tank from material 
damage induced by plastic deformations. To offer a large panel of tanks able to 
resist to the internal pressure, a manufacturer wants to know the most contributive 
parameters to the von Mises criterion variability. In the model we propose, the von 
Mises criterion depends on three geometrical parameters: the shell internal radius 
(Rint), the shell thickness (T s h e ii), and the cap thickness (T cap ). It also depends on 
five physical parameters concerning the Young's modulus (E s h e ii and E cap ) and the 
yield strength ((Jy )S heii an( A o yfiap ) of the shell and the cap. The last parameter is 
the internal pressure (Pint) applied to the shell. The system is modelized by a 2D 
finite elements code ASTER. 




Figure 2: Tank distortion at point y 



In table [21 we give the input distributions. 
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Inputs 


Distribution 


Pshell 
T 

± cap 


WQ1800; 2200]), j(R in t,T shell ) = 0.85 
U{ [360; 440]), j(T shell , T cap ) = 0.3 
W([180;220]), 7(^,^ = 0.3 


Pcap 
@y,cap 


a = 0.02, fi = 


( \ 
210 


*,£) + ( 


1 - a)JV(/i 

/ \ 

350 


,n) 

/ 

, o = 

I 


\ 

175 81 

81 417 

v. / 




Eshell 
^y, shell 


a = 0.02, fi = 


aN(f. 

( \ 
70 

1 300 


t,S) + ( 

, s = 


1 - a)7V(^ 
/ 

117 
500 


,n) 
, n = 

I 


/ N 

58 37 
37 250 




Pint 


iV(80,10) 



Table 2: Description of inputs of the shell model 



The geometrical parameters are uniformly distributed because of the large choice 
left for the tank building. The correlation 7 between the geometrical parameters 
is induced by the constraints of manufacturing processes. The physical inputs are 
normally distributed and their uncertainty are due to the manufacturing process and 
the properties of the elementary constituents variabilities. The large variability of 
Pint in the model corresponds to the different internal pressure values which could 
be applied to the shell by the user. 

To measure the contribution of the correlated inputs to the output variability, we es- 
timate the generalized sensitivity indices by the practical method exposed in Section 
[H We proceed to n = 1000 simulations over 50 runs. We use the 5-spline functions 
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for the geometrical parameters and the Hermite basis functions of degree 7 for the 
physical parameters. The first order indices dispersions are displayed in Figure [3] for 
both Greedy and LARS algorithm. 



Greedy algorithm LARS algorithm 



o 




Rint T she ii T cap E she n Ogheii E cap c cap P int R int T sne u T cap E sne u c sne u E cap o cap p int 

Sensitivity indices Sensitivity indices 



Figure 3: Boxplot representations of the first order sensitivity indices 

We observe first that the HOGS procedure applied with the greedy and the LARS 
techniques give very similar results. Once again, we do not observe a big difference 
between the two variable selection methods. 

The first four physical parameters are independent from the other inputs, and their 
effects are null, so we can deduce that they do not have any influence in the model. 
Also, even if the internal pressure plays an important role, the strongest contribution 
comes from the correlated set of geometrical inputs (Ri n t,T s heii,T cap ). The sensitiv- 
ity index related to the shell radius (Rint) is negative, so the covariance induced by 
the dependence dominates in the index, showing that either there is a strong nega- 
tive covariance part or the full contribution of the variable is small. In the first case, 
it shows that Ri n t is influent through its correlation. In the second one, the input 
Rint is not an influent variable in the model. The sensitivity indices of the shell 
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thickness i^T shell) and the cap thickness (T cap ) reveal that these two variables have a 
strong influence in the model. Thus, to scale down the variability in the model, we 
should reduce the cap thickness variability first. Because of the strong correlation 
between the shell radius and its thickness, one should reduce the variability of both 
parameters. 

APPENDIX A : CONVERGENCE RESULTS 

For sake of clarity, we first recall and define some notation that will be used further. 
Settings 

First, as mentioned in Section [U we assume that Y is centered. Also, we assume 
that a functional ANOVA of order d, with d < p, substitutes properly the initial 
model. Let us denote S d the collection of all subsets of [1 : p] of size at most d, 
where the empty set has been removed. Thus, S d = {u, C u C [1 : p] and \u\ < d}. 
Recall that, V i € [1 : p], Li := L^ is the dimension of the spaces H"' and G^. 
More generally, <f>u\ := 4>i- Also, L u := Ylieu is the dimension of the spaces Hu' L 
and Gu,n- 

For u G S d , l u = {li) i£ U is a multi-index of x [1 : Li], where x [1 : Li] is the 
Cartesian product of the sets [1 : Li], for i 6 u. 

We refer (<^Jz„ 6 x [i:L<] as the basis of Hl' L and {<pf u Ji ue x [i-.l,] as the basis of 
Gu,n constructed according to HOGS Procedure of Section [3l Thus, these functions 
all lie in L|(K?, B(W), Px). 

(•, •) and ||-|| are used as the inner product and norm on L^(R P , B(W), Px) ; 



(hi,h 2 ) = j /ii(x)/i 2 (x)p x di y (x), \\h\\ =(h,h) 

while (•, ■)„ and ||-|| n denote the empirical inner product and norm, that is 

1 - 

(5i,32) n = -Y]ffi(x s )52(x s ), \\g\\l = (g,g) n 

s=l 

when x s ) s= i j ... jn is the n-sample of observations from the distribution of (Y, X). 
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We set m := ^ues d L u the number of parameters in the regression model. Denote, 
for all u G S d , $ U (X U ) G (L 2 (R, B(R), P X ))S with ($ u (X u )) iu = <^(X U ), and by 
(3 any vector of 9 C M m , where (/3)j U;U = /3f , V l u € x [1 : Lj. 
Recall that, for a,b G N*, 7W a> (,(IR) denotes the set of all real matrices with a 
rows and b columns. Set X v = (<£> ... ^ ...J G x A4 n: L u (M.), where, for 

^ ' u€S d 

u G S d , (<p u ) s ,i u = ^ un (x u s ), V s G [1 : n], V l u G x [1 : Lj]. Also, we set 
= fci £ 2 •••) € x M n , Lu (R), where, for « G S d , (<U) sU = </£(x u s ), 
V s G [1 : n], V l u G x [1 : L»]. 

Denote by be the mxm Gramian matrix whose block entries are (E(<I> U (X U )'$„(X V ))) U ve s d - 
At last, we define the functions 

M n (f3) = ||Y - X^PWl (11) 

where Y s = y s , V s G [1 : n]. At last, the Euclidean norm will be denoted ||-|| 2 . 

Proposition 2. Assume that 

Y = r, R (X) + e, where r, R (X) = £ E /^°C(X U ) G #°> L 

«S5 d « u e x [l:Li] 



with E(e) = ; E(e 2 ) = a 2 , E(e • 0" (X u )) = 0, V l u G x [1 : LJ, V u G S d . 
((3 = {P^f)i u . u is the true parameter). 

Further, let us consider the least-squares estimation fj R of i] R using the sample 
(y s ,x s ) s e{i-.n] and the functions {<~pi^)i UyU , that is 



n 



l (X) = Y, ^( X u), where f, R (X u ) = E A>L, n (X u ) G G°£ 



uGS d i u e X [l:Li] 



where (3 = Arg minp^Q M n (/3) . If we assume that 
(H) The distribution Px is equivalent to ® p i=1 Pxi, 
then 

\\fj R - rj R \\ °4' 0, w/ien n — > +oo. (12) 

The proof of Proposition [2] is broken up into Lemmas [l](5j. To prove that (|12p . we 
introduce fj R as the following approximation of rj R , 

n R = E ^( x u) = E E AK, n (x u ), 

«e5 d uG5 d «„G X [1:L 4 ] 
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and we write the triangular inequality, 

\\f/ R - rj R \\ = \\f) R - fj R + fj R - r] R \\ < \\f) R - fj R \\ + \\fj R - i] R \\ (13) 

Thus, it is enough to prove that \\r) R — fj R \\ ^4' 0, and that \\fj R — n R || ^4' 0. 
Lemmas 0] and [5] deal with convergence results on \\fj R — rj R \\ and on 1 1 — fj R \\, 
respectively. Lemmas [H El [3] are preliminary results to prove Lemmas H] and [5j 
Lemmas [TJO are enunciated further below. 



Preliminary results 

Lemma 1. If (H) holds, then is a non singular matrix. 

Lemma 2. Let u,v 6 S d . Assume that Wtp) 1 — (bf II ^4' and \\ip? — 
for any l u £ x [1 : Lj] , l v £ x [1 : Lj] . Then, the following results hold: 



(i) 



and 



W a4 ' (CO and <C^U"^ <C>C 



Lemma 3. Let u £ 5 . -For any Zj, G [1 : Lj], assume that ||<^ n — 
Moreover, we also assume that there exists » £ S and Z„ € x [1 : Lj 



0. Then, we have 



0; 



(n) (^eu^^lj^ (U.eu^K)- 
Main convergence results 

Lemma 4. Remind that the true regression function is 

^(X) = ^i*u), where ^(X u ) = £ ^°C(X U ) € fl^ 



»7 



i„S x [l:Lj] 



Further, let rj R be the approximation of rj H , 



R 



-R 



(X) = J2 ^(*u), where fj R (X u ) = £ ^V? ttJ „(X«) € 
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Then \\fj* - r) R \\ a -4-' V u € S d , and \\fj R - n R \\ Q -4' 0. 



Lemma 5. Recall that (3 = ArgmiiLp e Q M n ((3). If (H) holds, then 



P-f3 a 4'0 



(14) 



Moreover, 



fj -rj 



R\\ a -l 



no. 



(15) 
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